Smart Paper Notes

Recent NLP models have shown a strong ability to generalise "zero-shot" to new tasks using only an instruction as guidance. However, these approaches usually repeat their instructions with every input, requiring costly reprocessing of lengthy instructions for every inference example. To alleviate this, we introduce Hypernetworks for INstruction Tuning (HINT), which use a pretrained text encoder to convert task instructions and examples into parameter-efficient modules that are inserted into an underlying model, eliminating the need to include instructions in the model input. Compared to prior approaches that concatenate instructions with every input instance, we find that HINT models are significantly more compute-efficient and consistently outperform these approaches for a given inference budget.
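
The compute argument above can be illustrated with a rough token count. This is a simplified sketch, not the paper's cost model: it assumes cost scales with the number of tokens the model processes, charges the one-time approach a single pass over the instruction, and ignores any other hypernetwork overhead. The function names and lengths are hypothetical.

```python
def tokens_with_concatenation(instruction_len: int, input_len: int, n_examples: int) -> int:
    """Tokens processed when the instruction is prepended to every input."""
    return n_examples * (instruction_len + input_len)


def tokens_with_one_time_encoding(instruction_len: int, input_len: int, n_examples: int) -> int:
    """Tokens processed when the instruction is encoded once and then reused."""
    return instruction_len + n_examples * input_len


# Hypothetical lengths, chosen only for illustration.
instruction_len, input_len, n_examples = 200, 50, 1000
concat = tokens_with_concatenation(instruction_len, input_len, n_examples)
once = tokens_with_one_time_encoding(instruction_len, input_len, n_examples)
print(f"concatenation: {concat} tokens, one-time encoding: {once} tokens")
```

Under these assumptions the gap widens as the instruction grows relative to the input, which is the regime the abstract describes.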

translated by Google Translate

Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems

Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-06-15

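We present results from a large-scale experiment on pretraining encoders with parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M to 170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Although we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has a 2.88% better intent classification error rate and a 7.69% better slot filling error rate than the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms XLM-R Base (85M params) and DistilBERT (42M params) by 4.23% to 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74% to 4.91% on an automatic measurement of full-system user dissatisfaction.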
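
The improvements above are reported as relative changes in error rate, which are easy to confuse with absolute percentage-point changes. The following minimal sketch shows the distinction. The helper name and the baseline error rate are hypothetical and are not figures from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which the error rate fell, relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical baseline: a 10% error rate improved by 3.86% relative.
baseline = 0.10
improved = baseline * (1 - 0.0386)

absolute_drop = baseline - improved  # measured in error-rate units
relative_drop = relative_error_reduction(baseline, improved)
print(f"absolute: {absolute_drop:.5f}, relative: {relative_drop:.2%}")
```

With this hypothetical baseline, a 3.86% relative improvement corresponds to well under half a percentage point in absolute terms.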


Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, and GraphCodeBERT) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models operate in the natural channel of code, i.e., they are primarily concerned with the human understanding of the code. They are not robust to changes in the input and are thus potentially susceptible to adversarial attacks in the natural channel. We propose CodeAttack, a simple yet effective black-box attack model that uses code structure to generate effective, efficient, and imperceptible adversarial code samples and demonstrates the vulnerabilities of state-of-the-art PL models to code-specific adversarial attacks. We evaluate the transferability of CodeAttack on several code-code (translation and repair) and code-NL (summarization) tasks across different programming languages. CodeAttack outperforms state-of-the-art adversarial NLP attack models, achieving the best overall drop in performance while being more efficient, imperceptible, consistent, and fluent. The code can be found at https://github.com/reddy-lab-code-research/CodeAttack.


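Open-world object detection (OWOD) is a challenging computer vision problem in which the task is to detect a known set of object categories while simultaneously identifying unknown objects. In addition, the model must incrementally learn new classes that become known in subsequent training episodes. Unlike standard object detection, the OWOD setting poses significant challenges: generating quality candidate proposals on potentially unknown objects, separating unknown objects from the background, and detecting diverse unknown objects. Here, we introduce OW-DETR, a novel end-to-end transformer-based framework for open-world object detection. The proposed OW-DETR comprises three dedicated components, namely attention-driven pseudo-labeling, novelty classification, and objectness scoring, to explicitly address the aforementioned OWOD challenges. Our OW-DETR explicitly encodes multi-scale contextual information, has less inductive bias, enables knowledge transfer from known classes to the unknown class, and can better discriminate between unknown objects and background. Comprehensive experiments are performed on two benchmarks: MS-COCO and PASCAL VOC. Extensive ablations reveal the merits of our proposed contributions. Furthermore, our model outperforms the recently introduced OWOD approach, ORE, with absolute gains ranging from 1.8% to 3.3% in unknown recall on the MS-COCO benchmark. For incremental object detection, OW-DETR outperforms the state of the art in all settings on the PASCAL VOC benchmark. Our code and models will be publicly released.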
